Add initial specification for Big Endian #470

djtodoro · 2025-07-01T14:19:37Z

The GNU GCC toolchain already supports big endian for the RISC-V target. That support was merged without a corresponding change to the psABI document.
Here [0] is the initial PR adding big-endian support to the LLVM project, so let's add the documentation part as well.

[0] llvm/llvm-project#146534

rui314 · 2025-07-01T23:27:18Z

I think we need to clarify which relocations write data in big-endian order when the output is big-endian. My understanding is as follows:

R_RISCV_{32,64}
R_RISCV_ADD{16,32,64}
R_RISCV_SUB{16,32,64}
R_RISCV_SET{16,32,64}
R_RISCV_32_PCREL
R_RISCV_PLT32
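
To make "write data in big-endian order" concrete for a data relocation such as R_RISCV_32, here is a minimal sketch of the store a linker might perform. The helper name `write_reloc_value` and its signature are hypothetical and not taken from any existing linker; the only assumption is that the relocated field is a plain 1- to 8-byte integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any real linker): store the low `size`
 * bytes of `value` at `loc` in the byte order of the output file. */
static void write_reloc_value(uint8_t *loc, uint64_t value, size_t size,
                              int big_endian)
{
    assert(loc != NULL && size >= 1 && size <= 8);
    for (size_t i = 0; i < size; i++) {
        /* Big-endian output puts the most significant byte at the lowest
         * address; little-endian output puts the least significant there. */
        size_t shift = big_endian ? 8 * (size - 1 - i) : 8 * i;
        loc[i] = (uint8_t)(value >> shift);
    }
}
```

With `size` set to 4 and `big_endian` set to 1, the value 0x11223344 is stored as the bytes 11 22 33 44; with `big_endian` set to 0 the same value is stored as 44 33 22 11.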

djtodoro · 2025-07-04T09:59:53Z

Hi @rui314, I agree, thanks a lot for pointing that out.

kito-cheng · 2025-08-12T03:07:19Z

@djtodoro do you plan to update the PR according to @rui314's comment?

djtodoro · 2025-08-12T08:10:33Z

Yes, sure.

jrtc27 · 2025-08-13T15:30:43Z

riscv-cc.adoc

+NOTE: Big-endian calling conventions follow the same rules as little-endian
+calling conventions. The only difference is in the byte ordering of multi-byte
+values in memory and registers. Register usage, argument passing, and return
+value conventions remain the same.


Does this address #265 (comment)?


kito-cheng · 2025-09-05T09:22:08Z

riscv-cc.adoc

+NOTE: Big-endian calling conventions follow the same rules as little-endian
+calling conventions. The only difference is in the byte ordering of multi-byte
+values in memory and registers. Register usage, argument passing, and return
+value conventions remain the same.


It's a draft for that: we already pass 2×XLEN-bit scalars the way we expect:

Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value. If exactly one register is available, the low-order XLEN bits are passed in the register and the high-order XLEN bits are passed on the stack.

So I tried to add a paragraph to clarify this and give an example, and also added a NOTE describing the rationale.

The other thing I added covers variadic arguments that are 2×XLEN bits wide and aggregates with XLEN < size <= 2×XLEN.

I haven't checked this against the GCC implementation yet, but IIRC it may not match GCC's default big-endian behavior.

diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 0768360..037b47f 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -191,6 +191,16 @@ available, the scalar is passed on the stack by value. If exactly one
 register is available, the low-order XLEN bits are passed in the register and
 the high-order XLEN bits are passed on the stack.
 
+This register-pair ordering is defined in terms of value significance and is
+independent of endianness. For example, on RV32BE a 64-bit scalar returned
+in a0/a1 places bits [31:0] (the least-significant XLEN bits) in a0 and
+bits [63:32] in a1; memory layout remains big-endian.
+
+NOTE: Defining the register-pair ordering independent of endianness allows
+RV32_Zdinx and Zilsd paired load/store paths to be used directly for argument
+passing and return without extra swaps. Memory layout remains governed by the
+target endianness.
+
 Scalars wider than 2×XLEN bits are passed by reference and are replaced in the
 argument list with the address.
 
@@ -198,7 +208,10 @@ Aggregates whose total size is no more than XLEN bits are passed in
 a register, with the fields laid out as though they were passed in memory. If
 no register is available, the aggregate is passed on the stack.
 Aggregates whose total size is no more than 2×XLEN bits are passed in a pair
-of registers; if only one register is available, the first XLEN bits are passed
+of registers with the fields laid out as though they were passed in memory:
+the lower-numbered register holds the lower-addressed XLEN-sized chunk of
+the aggregate and the higher-numbered register holds the next chunk;
+if only one register is available, the first XLEN bits are passed
 in a register and the remaining bits are passed on the stack. If no registers are
 available, the aggregate is passed on the stack. Bits unused due to padding,
 and bits past the end of an aggregate whose size in bits is not
@@ -231,7 +244,10 @@ same manner as named arguments, with one exception. Variadic arguments with
 even-numbered), or on the stack by value if none is available. After a
 variadic argument has been passed on the stack, all future arguments will also
 be passed on the stack (i.e. the last argument register may be left unused due
-to the aligned register pair rule).
+to the aligned register pair rule). For 2×XLEN scalars placed in an aligned
+register pair, the lower-numbered register holds the least-significant XLEN bits
+and the higher-numbered register holds the most-significant XLEN bits,
+regardless of endianness.
 
 Values are returned in the same manner as a first named argument of the same
 type would be passed. If such an argument would have been passed by
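
The RV32BE example in the first hunk can be sketched in C as follows. This is only an illustration of the proposed wording, assuming XLEN = 32; the type `reg_pair` and the helpers `split_by_significance` and `store_be64` are hypothetical names introduced here, not part of the psABI or of any compiler.

```c
#include <stdint.h>

/* Hypothetical model of an RV32 argument/return register pair. */
struct reg_pair {
    uint32_t lower;  /* lower-numbered register, e.g. a0 */
    uint32_t higher; /* higher-numbered register, e.g. a1 */
};

/* Significance-based split as worded in the draft: identical on
 * little-endian and big-endian targets. */
static struct reg_pair split_by_significance(uint64_t value)
{
    struct reg_pair p;
    p.lower = (uint32_t)value;          /* bits [31:0]  */
    p.higher = (uint32_t)(value >> 32); /* bits [63:32] */
    return p;
}

/* Big-endian memory image of the same value: most significant byte at the
 * lowest address. Only this part depends on the target endianness. */
static void store_be64(uint8_t out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(value >> (8 * (7 - i)));
}
```

For 0x1122334455667788 this gives a0 = 0x55667788 and a1 = 0x11223344, while the big-endian memory image starts with byte 0x11 and ends with byte 0x88.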

djtodoro · 2025-09-05T09:30:03Z

@kito-cheng Thanks, I agree!

I didn't check with GCC implementation yet, but IIRC that's may not match GCC's default big-endian behavior.

We will check the GCC implementation and fix it there.


kito-cheng · 2025-09-10T14:08:24Z

@aswaterman could you take a look at the big-endian calling convention part :)

djtodoro · 2025-09-18T11:59:32Z

Okay. For this basic test case:

$ cat test.c
long long test()
{
  return 0x1;
}

GCC for LE generates:

$ riscv64-unknown-linux-gnu-gcc -c test.c -O2 -march=rv32gc -mabi=ilp32
$ riscv64-unknown-linux-gnu-objdump -d test.o

test.o:     file format elf32-littleriscv


Disassembly of section .text:

00000000 <test>:
   0: 4505                 li a0,1
   2: 4581                 li a1,0
   4: 8082                 ret

And for BE, it generates:

$ riscv64-unknown-linux-gnu-gcc -c test.c -O2 -march=rv32gc -mabi=ilp32 -mbig-endian
$ riscv64-unknown-linux-gnu-objdump -d test.o

test.o:     file format elf32-bigriscv


Disassembly of section .text:

00000000 <test>:
   0: 4585                 li a1,1
   2: 4501                 li a0,0
   4: 8082                 ret

So, basically, it does not follow the proposal here. We came up with a fix (djtodoro/gcc@71a0f9f) that still needs some extra testing; with it applied, we now get:

$ riscv64-unknown-linux-gnu-gcc -c test.c -O2 -march=rv32gc -mabi=ilp32 -mbig-endian
$ riscv64-unknown-linux-gnu-objdump -d test.o

test.o:     file format elf32-bigriscv


Disassembly of section .text:

00000000 <test>:
   0: 4581                 li a1,0
   2: 4505                 li a0,1
   4: 8082                 ret
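
One way to read the unpatched listings is that the register pair is filled in memory order, so the word at the lower address of the value's in-memory image goes to a0. The sketch below models that reading for RV32. It is an interpretation of the observed output, not a description of GCC's internals, and `pair_from_memory_image` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model: fill an RV32 register pair by taking the 64-bit
 * value's in-memory image one 32-bit word at a time, lowest address first. */
static void pair_from_memory_image(uint64_t value, int big_endian,
                                   uint32_t regs[2])
{
    uint32_t lsw = (uint32_t)value;         /* least-significant word */
    uint32_t msw = (uint32_t)(value >> 32); /* most-significant word  */

    /* On a big-endian target the word at the lower address is the MSW. */
    regs[0] = big_endian ? msw : lsw; /* a0 */
    regs[1] = big_endian ? lsw : msw; /* a1 */
}
```

For the value 1 this yields a0 = 1, a1 = 0 on little-endian and a0 = 0, a1 = 1 on big-endian, matching the first two listings. The patched big-endian listing instead matches the little-endian result.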

aswaterman · 2025-09-19T20:13:03Z

I haven't had time to think this through yet, but make sure whatever you propose does the right thing for variadic functions. In particular, you want the argument-register layout to match the memory layout of arguments passed on the stack. This might encourage you to stick with GCC's current implementation, rather than making the change that @djtodoro mentioned.

djtodoro · 2025-10-13T06:57:45Z

@aswaterman @kito-cheng Thanks for your comments!

I checked variadic functions and found that the current psABI proposal text needs adjustment to match the actual GCC implementation after our fix (djtodoro/gcc@71a0f9f).

Here is a small example:

$ cat variadic.c 
#include <stdarg.h>
 
volatile unsigned int SN[2];
volatile unsigned int SV[2];
volatile unsigned int SR[2];
 
__attribute__((noinline))
void consume_named(unsigned long long x) {
  SN[0] = (unsigned)x;
  SN[1] = (unsigned)(x >> 32);
}
 
__attribute__((noinline))
void consume_var(const char *tag, ...) {
  va_list ap; va_start(ap, tag);
  unsigned long long x = va_arg(ap, unsigned long long);
  SV[0] = (unsigned)x;
  SV[1] = (unsigned)(x >> 32);
  va_end(ap);
}
 
__attribute__((noinline))
unsigned long long ret64(void) {
  return 0x1122334455667788ULL;
}
 
int main(void) {
  consume_named(0x1122334455667788ULL);
 
  consume_var("p", 0x1122334455667788ULL);
 
  unsigned long long r = ret64();
  SR[0] = (unsigned)r;
  SR[1] = (unsigned)(r >> 32);
 
  return 0;
}

Compile it as (asm files in attachment):

# this does not include our proposed fix: https://github.com/djtodoro/gcc/commit/71a0f9fc4bf9ff1b92ac434e362261ed16ff396b
$ riscv64-unknown-linux-gnu-gcc -S -O2 -march=rv32gc -mabi=ilp32 -mbig-endian variadic.c -o big_withoutfix_variadic.s
# with the fix
$ riscv64-unknown-linux-gnu-gcc -S -O2 -march=rv32gc -mabi=ilp32 -mbig-endian variadic.c -o big_afterfix_variadic.s
# LE
$ riscv64-unknown-linux-gnu-gcc -S -O2 -march=rv32gc -mabi=ilp32 variadic.c -o le_variadic.s

So, consume_named and the return value are okay: they behave as we implemented in the GCC fix. But the variadic case reveals an important distinction that needs to be documented. Let's look at the variadic argument in consume_var. The 64-bit value is passed after the fixed string argument:
- a0 = pointer to "p"
- a2/a3 = the 64-bit value (aligned pair)

In BE (both before and after our GCC fix):
- Memory: [0x11223344][0x55667788] (big-endian order)
- Register assignment: a2 = 0x11223344 (MSW), a3 = 0x55667788 (LSW)
- Stack after spilling: offset 24 = 0x11223344 (MSW), offset 28 = 0x55667788 (LSW)
This maintains big-endian memory layout on the stack, which is essential for va_arg to work correctly.

So the issue is: The current psABI proposal states that variadic 2×XLEN scalars should use "the lower-numbered register holds the least-significant XLEN bits... regardless of endianness."
But this would break va_arg functionality on BE systems.
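
The breakage can be reproduced with a small host-side model, assuming the callee spills a2 and a3 to consecutive save-area slots and va_arg then reads the value back as one 64-bit big-endian load. The helper names `store_be32`, `load_be64`, and `spill_and_read` are hypothetical, and the model ignores everything except byte order.

```c
#include <stdint.h>

/* Store a 32-bit register to memory in big-endian byte order
 * (models a word store on an RV32BE target). */
static void store_be32(uint8_t *mem, uint32_t reg)
{
    mem[0] = (uint8_t)(reg >> 24);
    mem[1] = (uint8_t)(reg >> 16);
    mem[2] = (uint8_t)(reg >> 8);
    mem[3] = (uint8_t)reg;
}

/* Read a 64-bit big-endian value from memory, as va_arg would once the
 * argument registers have been spilled. */
static uint64_t load_be64(const uint8_t *mem)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | mem[i];
    return v;
}

/* Spill a2 to the lower slot and a3 to the higher slot, then read back. */
static uint64_t spill_and_read(uint32_t a2, uint32_t a3)
{
    uint8_t save_area[8];
    store_be32(save_area, a2);     /* lower address  */
    store_be32(save_area + 4, a3); /* higher address */
    return load_be64(save_area);
}
```

With memory-layout ordering (a2 = 0x11223344, a3 = 0x55667788) the read returns 0x1122334455667788. With significance-based ordering (a2 = 0x55667788, a3 = 0x11223344) it returns 0x5566778811223344, which is the word-swapped value.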

For the psABI, I propose we clarify the distinction:

Named args/returns on BE: a0=LSW, a1=MSW (significance-based ordering - stays as is)
Variadic args on BE: Use memory-layout ordering to maintain stack consistency

So, the proposal could be:

   Variadic arguments with
   even-numbered), or on the stack by value if none is available. After a
   variadic argument has been passed on the stack, all future arguments will also
   be passed on the stack (i.e. the last argument register may be left unused due
  -to the aligned register pair rule).  For 2×XLEN scalars placed in an aligned
  -register pair, the lower-numbered register holds the least-significant XLEN bits
  -and the higher-numbered register holds the most-significant XLEN bits,
  -regardless of endianness.
  +to the aligned register pair rule).  For 2×XLEN variadic scalars placed in an 
  +aligned register pair, the register assignment follows memory layout ordering:
  +the lower-numbered register receives the XLEN bits from the lower memory address
  +and the higher-numbered register receives the XLEN bits from the higher memory
  +address. This ensures correct va_arg operation when arguments are spilled to stack.
  +
  +NOTE: This memory-layout ordering for variadic arguments differs from the 
  +significance-based ordering used for named arguments and return values on 
  +big-endian systems.

Please let me know your thoughts about this.

big_afterfix_variadic.s.txt
big_withoutfix_variadic.s.txt
le_variadic.s.txt

aswaterman · 2025-10-14T00:13:52Z

That sounds plausibly correct to me, but @kito-cheng should sanity-check it.

aswaterman · 2025-10-14T00:14:44Z

Also, make sure to run through the GCC test suite with this scheme. Your simple test appears to catch the interesting case, but the test suite covers much more ground.

djtodoro · 2025-10-14T05:38:40Z

@aswaterman Thanks!

Of course, I agree :)

djtodoro · 2025-10-29T14:24:49Z

ping @kito-cheng :) any thoughts on this? :)

djtodoro mentioned this pull request Jul 1, 2025

[RISCV] Add initial assembler/MC layer support for big-endian llvm/llvm-project#146534

Merged

djtodoro force-pushed the pr/riscv-be branch 2 times, most recently from 332088f to 97ccbce Compare August 12, 2025 08:20

jrtc27 reviewed Aug 13, 2025

pz9115 mentioned this pull request Aug 15, 2025

Enabling RISC-V Big-Endian Support in GCC Toolchain riscv-collab/riscv-gnu-toolchain#1740

Open

kito-cheng reviewed Sep 5, 2025

rui314 reviewed Sep 9, 2025

Add initial specification for Big Endian

ca43634

djtodoro force-pushed the pr/riscv-be branch from 97ccbce to ca43634 Compare September 10, 2025 13:54

kito-cheng requested a review from aswaterman September 10, 2025 14:07

kito-cheng mentioned this pull request Oct 8, 2025

ABI for _BitInt #419

Open
